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Square root calculation is a widely used task in real-time control systems 
especially in those, which control power electronics: motors drives, power 
converters, power factor correctors, etc. At the same time calculation of 
square roots is a bottle-neck in the optimization of code execution time. 
Taking into account that for many applications approximate calculation of a 
square root is enough, calculation time can be decreased with the price of 
precision of calculation. This paper analyses existing methods for fast square 
root calculation, which can be implemented for fixed point microcontrollers. 
It discusses algorithms’ pros and cons, analyses calculation errors and gives 
some recommendations on their use. The paper also proposes an original 
method for fast square root calculation, which does not use hardware 
acceleration and therefore, is suitable for implementation at a variety of 
moder Digital Signal Processors, which have high-speed hardware 
multipliers, but do not have effective dividers. The maximum relative error 


of the proposed method is 3.36% for calculation without division, and can be 
decreased to 0.055% using one division operation. Finally, the most 
promising methods are compared and results of their performance 
comparisons are depicted in tables. 
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1. INTRODUCTION 

Modern control systems of power electronics have extended functionality and large code to be 
executed. At the same time the most of mass produced devices are subject for cost optimization, therefore 
developers frequently select cheap microcontrollers, which are not powerful. The bulk of the code including 
math functions and model of control objects are run every sampling period of the system, which is typically 
equal to modulation period. The modulation period of the majority of power electronics control systems lies 
in the interval of 4 - 20 kHz, while operating frequency of the cost-effective microcontroller belong to the 
range of 40-80 MHz, which gives 2,000-20,000 processor cycles per modulation period. High power 
devices operate at lower modulation frequencies, while lower power electronics and devices with lower 
inductances, which are widely used in home appliances, automotive and industry, operate at higher 
modulation frequencies. The price of high power electronics is also high, so increase of microcontroller’s 
cost in 1 - 2 $ does not impact significantly the total price of device, but the cost of the low power units is 
low, and the pressure form competitors make every cent significant. So developers of power electronics put 
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cheap Digital Signal Processors (DSP) in their solutions and then try to optimize software to be able run in 
the selected DSP. One of the main milestones in the way of optimization is square root calculation routine. 

Square root operation is frequently used in many control systems of power electronics and digital 
signal processing algorithms. Tasks with square roots can be divided into two groups. Tasks from the first 
group are executed every sampling period and process newly sampled data. Tasks from the second group are 
executed rarely (with a frequency less than 120 Hz), when measurement results are needed and process data 
from multiple samples. First group includes voltage and current vector transformations and normalizations, 
models of control objects, observers, etc. Second group includes calculation of Root Mean Squares (RMS) 
and Total Harmonic Distortion (THD), least squares based algorithms, etc. Square root operation is also 
intensively used in electrical drives, where it is involved into motor models calculation, Maximum Torque 
Per Ampere (MTPA) control [1], Maximum Torque Per Voltage (MTPV) control, field weakening [2], 
various observers, Power Factor Correctors (PFC), etc. 

Authors of this paper faced with the same problem, which arose, when 80 MHz DSP was substituted 
with 64 MHz DSP and software operating at 16 kHz failed due to the MCU overload. Analysis of the 
software revealed several bottle-necks, one of which was square root calculation routine called several times 
per sampling period and two times in the 60 Hz interrupt. At the same time implementation of the square root 
calculation routine used digit-by-digit method, which was not perfect and took a lot of processor cycles. 
Therefore, authors concentrated their efforts on the optimization of time used for square root calculation. 

Let's formulate a problem for the discussion. For a given number S, it is necessary to find the 
number X so that: 


VS x X. (1) 


Modern DSPs for cheap control systems of power electronics are typically 16 or 32-bit with 10 or 
12-bit ADC. Therefore, majority of variables in control algorithms are 16-bit and intermediate results of 
calculations are 32-bit variables. Consequently, X is expected to be 16-bit and S is expected to be 32-bit. 

The overwhelming majority of DSPs are designed for fast multiplication with addition, but has no 
hardware acceleration for the division operation. Consequently, implementation of the division operation is 
slow and typically takes as many processor cycles as number of bits of the result. For a 16-bit result, one 
division operation takes about 16 cycles and significantly slows down calculations. Thus, it is desired to 
minimize the number of divisions in the square root calculation algorithm. 

In such a way our purpose is to accelerate the square root calculation with the price of decreased 
tolerance. As square root operation is typically used for the processing of measured signals corrupted by 
noise, calculation errors should be similar to the measuring errors, which are typically 0.5 — 5%. However 
some cases demand higher precision, so calculation methods must be able to decrease errors by running one 
to two additional iterations. Therefore, the calculation method must quickly calculate the square root with 
errors in the range of 0.5 — 5% and must have fast convergence to be able to decrease errors in one to two 
additional iterations. 

Basically, the algorithms for square root evaluation can be divided into two main groups: iterative 
methods and approximation by real functions. Iterative methods comprise three classes: direct methods, 
algorithms based on the Newton-Raphson formula and normalization techniques [3]. 

Authors of [4-13] reported hardware implementation of non-iterative methods. These algorithms are 
mainly designed for low-resolution data (8-bit), but operate extremely fast and can be used for pre-processing 
of sampled data Such methods need additional hardware and cannot be used in general DSP-based 
algorithms. Authors of [14] used parallel calculation in reported speed increase in more than 38%. 

The uncommon approach for square root calculation was reported in [16]. Authors proposed to use 
iterative execution of the nonlinear Infinite Impulse Response (IIR) filter to obtain the square root of the 
given number. This method does not involve division operation, but intensively uses multiplication with 
addition operations. 

This paper is divided into five sections. In Section II the authors explain existing methods for the 
fixed point square root calculation, and discuss their pros and cons. Section III proposes a new method and 
discusses its tolerance. Experimental results and performance of the most promising methods are given in the 
Section IV. Section V contains conclusions. 


2. EXISTING METHODS 

The authors of [3] give classification and extensive review of the general square root calculation 
methods, but not all of them are suitable for fixed point implementation. Paper [3] can be extended with [15] 
and [17], which review and analyse only fixed point algorithms. 
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There are several square root calculations methods, which can be considered to be fast. Some of 
them utilize hardware features of processor core and can be implemented in the limited number of DSPs, 
while others are general methods, which do not depend on processor core. This paper reviews the most 
prospective methods and discusses their pros and cons. 


2.1. Digit-by-digit calculation 

This method for square root calculation at fixed point DSPs, is the most popular as it is easy to 
understand and implement. Despite relatively slow convergence, many engineers still pay attention to this 
method, often implementing it in hardware [18] and software [19]. This method can be considered as a basic 
method and can be used for the evaluation of other methods. 

For better understanding the algorithm implementation in binary system, let’s consider its operation 
with decimal numbers, which is often used for manual calculation of square roots with paper and pencil. 

Suppose Xx is the k-th digit of X, so: 


X =X, +10" +X, 104+... +X, -10 + Xp (2) 


The most significant digit Xx is the first approximation of the square root X and can be easily found 
as the highest integer which satisfies: 


X7-107* <S (3) 
The next digit X;.; of square root X can be found using the following inequality: 

S > (Xp 10% + Xp- < 10*-1)2 (4) 
which transforms into: 

S — XÈ - 10?¥ > X,_4 -10?4-Y(20 - Xk + Xk-1) (5) 


Similarly, the next digit can be calculated as: 


S-| X210 +(20X, +X,1)X 1:10 10% >| 2010X, +X, |+ Xr l Xr 10? 
cAr ki 


Leftmost pair Next pair subtrahend Root at previous step New digit} New digit 
subtrahend 





(6) 


This approach can be used recursively during the next steps to obtain the necessary number of 
digits. Since every digit of the result X is multiplied by 102k, it’s easier to separate S into pairs of digits and 
calculate Xk for every pair. If S contains odd number of digits, a 0 must be added to the left. Thereafter, 
every pair is combined with previous step remainder and new digit satisfying: 


R > (20 - Xp + Xk-1)Xk-1 (7) 


must be found. 

Let’s denote root calculated at the current step as “R” and remainder of the initial value as “Rem”. 
R is initialized with zero, and every iteration of the algorithm calculates the next digit of the root. 
The flowchart of the algorithm is shown in Figure 1: 

Let’s illustrate this algorithm with the following example: The square root of 54756 is: 
So answer is 234. 

This method can be extended to binary numbers. A binary system is simpler for calculation because 
the greatest number can be found just by comparing numbers. The only difference is that condition (7) 
transforms into: 
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4054756 \ Given number 


The first digit is 2 because 





moe ; ; 
04 EaR 
The second digit is 3 because 
_147 
(20-2+3):-3<147<(20-2+4)-4 
129 Nay — 
129 176 
The third digit is 4 because 
1856 
“Tese f(20:23+4)-4<51856<(20-2345)-5 
1856 2325 
R > (100, : Xk + Xk-1)Xk-17 (8) 


because the radix number changed from 10 to 2. 
The next example shows root calculation in the binary system. 


100010100001, | Given number (2209, ) ( Begin ) 


























10 First 2a is 1 because 
g OL <10 Separate S into pair of digits. 
p Add “0” t the left if necessary 
E Second me is 0 because 
tae (100, -1,+1,)1, >100, y 
000 k O b e LA) R=0; 
LO Rem = Leftmost pair; 
Third digit is 1 because 
eee 100,-10,+1,)1, $1001, 
01001 (100,10, +1, )ty < 
1001, Find greatest digit X such that 
Fourth digit is 1 because X- (20-R + X) < Rem 
zo 100,:101,+1,)1, <10011 
010101 (100, 0 pt b/~b 00 Q, 
tear R=10-R+X 


Fifth digit is 1 because 
(100,-1011, +1,)l, <100010g 


AS aE Se A 
101101, 


0101101 


Sixth digit is 1 because 
1011101 100,-10111, +1, )l, <1011101, 


1011101 1011101 , 
Rem = Rem: 100 + Next pair; 


So, result is 101111b = 47d 














Í 
omo Y 
P 











Figure 1. Flowchart of digit-by-digit method 


This method does not use specific hardware and its execution time does not significantly depend on 
the processor type. The advantages of this method are precise results (LSB or half LSB if remainder is 
analysed), simple implementation and absence of division. It calculates root digit by digit, starting from the 
most significant, so calculations can be stopped before processing of full number S, if acceptable tolerance 
is reached. 

However, the most significant disadvantage of the explained method is calculation time. Despite 
this, digit-by-digit method is frequently used in the control systems, where the processor is not highly loaded. 
It was also proposed for implementation in hardware and the authors of [19] enhanced this idea to calculation 
of other functions and implemented on FPGA. 


2.2. Polynomial approximation 

Square root function is difficult for approximation because its derivative rapidly changes close to 
zero, which results in high approximation errors. Therefore, square root can be successfully approximated 
only at the limited interval. 
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Polynomial approximation is proposed in [20] and recommended by engineers from Analog Devices 
to use with their DSPs. They propose to approximate square root function f(x) = x0.5 at the interval of 
[0.5..1], with fifth order polynomial: 


Vx ~0.1121216-x° -0.536499- x* +1.106812- x? -1.34491- x? +1.454895 - x +0.2075806 (9) 


Since approximating polynomial gives appropriate results in the limited interval, the initial number 
must be scaled to fall inside it. After calculation of the square root of the scaled number, the resulting value 
must be multiplied by the square root of the scaling value to obtain the correct answer. The authors of [20] 
published assembler code of the proposed method and claimed maximum execution time of 75 cycles. 

This essential scaling operation is not simple for many DSPs and takes time, however processors 
from Analog Devices have hardware units called an exponent detector, which calculates the amount of the 
redundant sign bits and shifts initial numbers, so calculation is accelerated. Moreover, sign operation of the 
exponent detector limits the range of input value by 32768, which is restricting in most cases. Evaluation of 
the high power polynomial increases calculation error and impacts tolerance of the result. So this method is 
not recommended for implementation. 


2.3. Table approximation 

Lookup table-based methods are common for function approximation. They are relatively fast, but 
use a lot of memory to increase tolerance. Bipartite and multipartite methods are excellent examples of table 
lookups and are described in [18]. Authors propose inserting two lookup tables into the data paths and use 
one table for the initial seeds, while another one should be used for adding a small correcting value. 

The approximation error can also be reduced by interpolation between table points (e.g. linear), 
but it increases the calculation time and diminishes the main advantage of the method - speed. The flowchart 
of function approximation with variable segments table and linear interpolation between points is depicted in 
Figure 2. It indicates that calculation contains one division, which significantly slows down calculation. 

The calculation time of the function approximated with table can be significantly decreased, if the 
table consists of equal segments and a number of them is to the power of two. A flowchart of the 
corresponding algorithm is depicted in Figure 3. It indicates that calculation was simplified and division had 
been excluded. 

Table approximation perfectly works for the smooth function with smooth derivative and outputs 
higher errors for the function with rapidly changing derivatives. A typical example of functions, which can be 
approximated easily, is sine and cosine. 

As stated above, square root is not easy for approximation, because its derivative rapidly changes 
close to zero. Therefore, variable step table approximation or pre-scaling must be used. For this specific 
reason, Figure 3 contains two scaling blocks. 

On the one hand, the tolerance of this method mainly depends on the size of the table and in some 
cases the size is unacceptably large. On the other hand, this method needs pre-scaling before table is read and 
after, which diminishes its main advantage - calculation speed. 


2.4. Newton’s method 

This method is very popular for numerical calculation because it is simple and has fast convergence. 
It can be successfully used for the approximation of various functions and calculation operations [21]. Let’s 
consider its application to the calculation of square roots. 

Finding X, which is square root of S is the same as solving: 


x?-S=0 (10) 
According to Newton’s method, the next step approximation xi+1 of the function root is: 


— >, _ fi) 
xm = EE a1) 
where xi is root approximation at the current step. The calculation process can be illustrated by drawing 


series of secant lines to parabola as shown in Figure 4. The iterative calculation is stopped, when the 
necessary accuracy has been reached: 
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Scale S to be in the range [0.5..1]. 
K is exponent (numter of shifted bits) 


ý 


Define table index N | 








Determine table segment, i.e. define table 
index N, so that 
SEN] < S < S[N+1] 





N = S >> (32 - TblSize) 


ý 


Read data from table: | an a | 
XIN], X[N+1], SIN], SIN+1] q 














santa liner approximation: 
]+(x[w +1]- x[v)- 


e & ae —1))>> TblSize 








Perform liner approximation: 





(x[n +1]- x[v]Xs - six) 














eS ore] ae 
PUT Scale resulting value: 
X=X.- V2“ 
End 
Figure 2. Flowchart of function approximation with Figure 3. Flowchart of square root approximation 
variable segments table and linear interpolation with equally segmented table with size of 27° and 
between points linear interpolation between points 

lin — xi SE (12) 


However, for simpler calculation, the iterative process can be limited to fixed number of iterations. 
Thus, the function given in (10), using (11) transforms into: 


2g 1 s 
xm = xy EE 2 (x, +=) (13) 


2X; Xi 





This equation is also known as the Babylonian method [22]. It has quadratic convergence and allows 
us to calculate square root with low number of operations. However, each iteration of the method contains 
division, which takes a significant amount of processor time 

Properly selected initial value can significantly accelerate calculation, so an initial guess x0 is very 
important. We suggest to locate square root by finding n, so that square root belongs to the interval: 


INEX LE (14) 
This step can be easily implemented in the DSPs and provides advantage of using the power of two, therefore 
division at the first step can be substituted with bit shift. Then the initial value can be selected during this 
interval, using any criteria. Typical initial values are middle point of the interval: 


Xo = 3-2" (15) 


which is the most probable number, and value, which gives symmetrical relative error at 
interval borders: 





Xp a 4 a 2” 
3 (16) 
Using (15) as an initial value and combining with (13) results: 
1 s 
x =}( =) =3- gn? 4 (17) 


where division can be substituted by the bit shift. However, the following steps involve division and are 
performed according to (13). 
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Let’s calculate the error for every iteration in order to define the number of required steps. 
The relative error is higher, when X lies at the interval’s lower border: X=2n. 
Relative error of initial approximation: 


x 3-2071 
Ôo = aues 1 = 


X an 





al 
-1=+ (18) 


First step approximation: 


2n n-2 n n 
452, 2 9-207242 13-2 
=g p 19 
T 3-20 3 12 (19) 





Relative error of the first step approximation: 


6 2S SE op Se 83% (20) 


X ~ 122% 12 





Second step approximation: 




















113-2" 12:2?” 313-2” 
%2 = A 12 = ~ 312 (21) 
Relative error of the second step approximation: 
_ X2 _ 313-2" ee eer 
62 = qt a 032A (22) 
Third step approximation: 
1 313-2" | 312-227 195313-2” 
x3 =+( aa (23) 
2\ 312 313-2 195312 
Relative error of the third step approximation: 
= X3 = = 1 x A -6 0 
63 = ap Sy ea 5.1 -107° % (24) 


This tolerance is typically sufficient, and square root calculation needs three iterations, 
which involve only two divisions. 








Figure 4. Geometrical interpretation of the Newton’s method 


2.5. A two-variable iterative method 

This method was proposed by the authors of [23] for one of the first computers. It is applicable to 
the square root calculation of the number S satisfying 0 < S <3, so it also needs scaling of the initial number. 

This method calculates two numbers a and c at every iteration, where a converges to square root of 
S and c converges to 0. 

These numbers are initialized as: 
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a =S; cy=S-1 25) 
and at every iteration they are updated as: 


Qj41 = Ai 0.5- ajCj 
Ci41 = 0.25 - c? (c; — 3) (26) 


It should be noted that for faster calculation S can be scaled to satisfy 0.5 < S <2. 
The maximum relative error occurs when S =2. Maximum relative errors for each calculation 
step are: 


by) = V2 — 1 41.4%; A 1 29% 
i Ta ci . 0, H eo v2 aad 0 
ô = i 1 = -11.6%; 53 =~ —1.94% 
5, = —0.056% (27) 


This method, as Newton’s method, has quadratic convergence and typically 3 — 4 iterations of (26) 
is sufficient. This calculation method does not contain division operation and time delays inherent to it. 

The problem of this method is multiplication of 32-bit numbers, which is typically slower than 
16-bit number multiplication. It also truncates 64-bit numbers to 32-bit, producing calculation errors, which 
tend to be accumulated. 


2.6. Goldschmidt’s algorithm 

One more interesting and prospective method is Goldschmidt’s algorithm. Its usual description 
reflects a hardware orientation with difficulties of implementation in software, however the author of [24] 
suggests a software-friendly way of computing, which is described below. 


1/45 


Let y0 be a suitably good approximation to , such as: 

Le gay? <? 

[= S- y S 3 (28) 
Initialize variables: 

Xo = S ` Yo 


And then iterate as follows: 


Naa = 0.5 — Xi-1 ya 
Xi = Xj-1 + Xi-1 ` N-1 (30) 
hi = hi- + hi-1 ' fi-1 


Calculation can be stopped, when ri is sufficiently low or after fixed number of iterations. 
Calculated values converge to: 


limx; = VS 


i> 


T 
i>% Vs (31) 


The maximum relative error occurs when, 
1 


Yo = oe (32) 


And maximun relative errors for each calculation steps are: 
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1 5 
— -1 x —29%; 6; = — — 1 x —11.6% 





g= 
Mg, AN2 
ô ==- 1x 1.94%; ôs = —0.056% (33) 


The author of [24] proves that Goldsmith’s algorithm, as Newton’s method has quadratic 
convergence, but has an advantage of absence of division operation. The feature of the algorithm is that it 
simultaneously calculates square root and reciprocal square root. However, if one of these values is not 
needed, the corresponding equation in (30) can be excluded. 

Equations in (30) can be effectively implemented for large numbers of DSPs, which have hardware 
units for multiplication with addition. However, it should be noted that variables in (30) have significantly 
different orders, therefore a lot of attention must be paid to their representation. 

The disadvantage of this method is the necessity of initial approximation (28), which is quite 
difficult. The paper [24] suggests using small lookup tables for this purpose, but it takes additional memory. 
Another drawback of this method is propagation of errors due to rounding. The author of [24] claims that 
high precision results are usually achieved by starting with Goldschmidt’s algorithm and then switching to 
the self-correcting Newton’s iterations. However, for our purpose the tolerance of original algorithm is 
sufficient. 


3. PROPOSED METHOD 

After analysis of advantages and disadvantages of the discussed methods, the authors proposed a 
combined method, which is far superior to other algorithms. 

Authors propose to define an initial root interval as described in (14). Then, initial approximation 
can be made by drawing the secant between the ends of the root interval and calculation of secant intersection 
with abscissa axis, Figure 5. 

The equation of the secant line drawn over two points [x;, y/] and [x2, y2] is: 


_ ¥1—-y2 Y2X1—-V1%X2 
A tt (34) 


Its root can be found as: 


= yas a9 


x 


Assuming x; = 2”, x2 = 2"*/ and combining (10) with (35) results initial guess: 





_ (22%=s)2mt1_(22n42_5)25 _ a Pat J 
Xo = (22n_5)—(22n+2_s) T3 2 + Qn (36) 
Let’s calculate the maximum error Axmax. From Figure 6 error function for the approximation with secant is: 


x2 gnt1 





5 — „ — I fon+1 x? = 
Ax(x) =X) -x = > (2 + =) OS Gg AEs (37) 
It has maximum at the point: 
oe nl 
Arne OP (38) 
And maximum absolute error is: 
AX mx ae, 
12 (39) 














Xx 3-2"" 18 (40) 
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As can be seen from (40), the error is negative, so initial guess xo can be improved by adding shift. 
Xo = span + =) +A (41) 


Ay 














Figure 5. Geometrical interpretation of the Figure 6. Error of the approximation 
proposed method 


In order to have equal positive and negative absolute errors, offset value A should be 1/24-2n and 
(41) transforms into: 


Xo -2{2" + s) a Haira +5) 
3 2 24 3 2 (42) 


If we need same values of the relative errors, offset value A should be 0.0336735-2" and (41) 
transforms into: 





Xo = (2m + =) + 0.0336735 -2" = + (1.05051025 nti 4 =) (43) 


And from (40) maximum relative error is: 





5 sae 36m 
AX mx (44) 


This error is almost twice less than the first step error in the Newton’s method. If precision of (44) is 
not sufficient, one more calculation step should be performed. It could be performed according to (13), 
which is simple for implementation, but involves one division operation. 
Let’s find the maximum error, which corresponds to X=2n. 
Initial approximation: 


2n 
xo = ~(1.05051025 - 2”+! +2) = 1.0336735 - 2” (45) 
3 an 


First step approximation: 


22n 


x, = +(1.0336735 - 2" + —= —__ 
2 1.0336735-2 


) = 1.0005485 - 2” (46) 


Relative error of the first step approximation: 


5, = q = HOES _ 1 = 0.055% (47) 


Qn 
Which is sufficient for a wide range of engineering applications. 
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3 
EA 
2 








Figure 7. Error function of the proposed method 


4. EXPERIMENTAL RESULTS 

For the performance analysis, the most prospective algorithms and digit-by-digit method as a 
reference, have been selected. Due to the fact that the proposed algorithm is not iterative, it has been 
enhanced with one iteration of Newton’s method in order to obtain higher precision. 

Calculation time, operations and the number of processor cycles were calculated for the lower 
precision calculation (about 2%) and higher precision calculation (about 0.05%). The results were put into 
Table 1. and Table 2 for the lower and higher precision calculations, respectively. These tables contain main 
operations and do not include data initialization, data movement and branching, which strongly depend on 
the compiler optimization and conveyer mechanism of the microprocessor. The real calculation time of the 
selected methods is typically greater at 8-15 cycles depending on the optimization settings. 

For the experimental results Cortex-M3 based microprocessor has been selected, because this core is 
widely used and does not have specific hardware acceleration. Timings of instruction execution were taken 
from the technical reference [25]. For calculation of execution time we considered the worst case scenarios, 
e.g. according to [25] division operation takes 2-12 cycles, depending on the data, so we used 12 for method 
speed evaluation. 


Table 1. Complexity of calculation methods with lower precision 

















Characteristics Number of Maximum Operations Number of 
Method iterations error +,-,&,1,% <<, >> Compare x + cycles 
Digit-by-digit 16 8-10% % 65 64 17 0 0 146 
Newton-Raphson 2 0.32 % 9 9 4 0 2 46 
Two variables method 3 1.94 % 11 13 4 9 0 37 
Goldsmith’s algorithm 2 1.94 % 11 11 5 7 0 34 
Proposed method 0 2.86 % 6 7 4 1 0 18 
Table 2. Complexity of calculation methods with Higher precision 
Characteristics Number of Maximum Operations Number of 
Method iterations error +,-,&,|,% <<, >> Compare x + cycles 
Digit-by-digit 16 8-10% % 65 64 17 0 0 146 
Newton-Raphson 3 5-10° % 10 10 4 0 3 60 
Two variables method 4 0.056 % 13 16 4 12 0 45 
Goldsmith’s algorithm 3 0.056 % 14 14 5 10 0 43 
Proposed method 1 0.04 % 7 8 4 1 1 32 





All the multiplications were considered as 32-bit operations, which generally take 1 cycle. 
Every multiplication operation is supposed to involve one more shift operation for scaling of the result. 
Some algorithms operate with numbers of different ranges, so they may need 64-bit multiplication, which is 
executed in 3-7 cycles, but this issue was not considered here. 

Initial scaling, which finds n satisfying (14), is performed by four stages of comparison and includes 
maximum 4 comparison, 3 shifts and 4 additions. Initial conditions for Goldsmith’s algorithms are tougher, 
so it needs one more stage and takes 5 comparison, 3 shifts and 5 additions. 

Experimental results prove feasibility of the proposed method and indicate that it is fastest among 
reviewed algorithm. Furthermore, its simplicity makes implementation in hardware possible, resulting in high 
speed with acceptable tolerance. 





Review of fast square root calculation methods for fixed point microcontroller-based control ... (A. Dianov) 


1164 O ISSN: 2088-8694 


5. CONCLUSION 

This paper analyses the existing methods for square root calculations at fixed points DSPs, 
and proposes an original method for the approximate calculus with acceptable tolerance. The maximum 
relative errors for several iterations are calculated, so the number of steps can be selected, depending on the 
demanded tolerance of the result. The complexity and performance of the proposed method and competitive 
methods were both checked and compared. The experimental results indicate significant leadership of the 
proposed method, thus it is recommended for implementation in new algorithms or hardware. 
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